{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial: Plugging User-Designed Methods into DANCE 2.0 for Auto-Search\n", "\n", "In this notebook, we walk through how to integrate a new algorithm (specifically, an SVM classifier) into the DANCE 2.0 auto-search framework. We will:\n", "\n", "1. Inherit from **BaseClassificationMethod** (or another suitable base class) to define our custom method, and implement the required interfaces (`fit`, `predict`, and optionally `preprocessing_pipeline`).\n", "2. Show how to run the hyperparameter search with the integrated method.\n", "3. Provide an example `main.py`-like script that demonstrates how the auto-search process is orchestrated.\n", "\n", "## 1. Folder Structure & Requirements\n", "\n", "Before diving in, make sure you have the following directory structure (at least conceptually; your actual project structure can be more extensive):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "examples/tuning/\n", "└── classification_svm/\n", "    ├── main.py\n", "    ├── tutorial.ipynb\n", "    └── dataset_name/\n", "        ├── pipeline_params_tuning_config.yaml\n", "        └── config_yamls/\n", "            ├── 0_test_acc_params_tuning_config.yaml\n", "            ├── 1_test_acc_params_tuning_config.yaml\n", "            └── 2_test_acc_params_tuning_config.yaml\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, `classification_svm` is the directory we created for our new algorithm. The same pattern applies to other methods, such as `clustering_kmeans`, `regression_linreg`, etc.\n", "\n", "We'll focus on the **SVM** example below.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Defining Our SVM Classifier\n", "\n", "Suppose we want to define a custom SVM method for classification.\n", "We'll inherit from `BaseClassificationMethod` and implement the required methods." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/zyxing/dance/dance/utils/matrix.py:178: NumbaExperimentalFeatureWarning: First-class function type feature is experimental\n", " for j in numba.prange(n):\n", "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/numba/np/ufunc/parallel.py:371: NumbaWarning: The TBB threading layer requires TBB version 2021 update 6 or later i.e., TBB_INTERFACE_VERSION >= 12060. Found TBB_INTERFACE_VERSION = 12050. The TBB threading layer is disabled.\n", " warnings.warn(problem)\n" ] } ], "source": [ "from typing import Optional\n", "\n", "import numpy as np\n", "from sklearn.svm import SVC\n", "\n", "from dance.modules.base import BaseClassificationMethod\n", "from dance.transforms.cell_feature import WeightedFeaturePCA\n", "from dance.transforms.misc import Compose, SetConfig\n", "from dance.typing import LogLevel\n", "\n", "\n", "class SVM(BaseClassificationMethod):\n", "    \"\"\"The SVM cell-type classification model.\n", "\n", "    Parameters\n", "    ----------\n", "    args : argparse.Namespace\n", "        A Namespace containing the SVM arguments. See the parser help document for more info.\n", "    prj_path : str\n", "        Project path.\n", "    random_state : int, optional\n", "        Random seed passed to the underlying ``sklearn.svm.SVC``.\n", "\n", "    \"\"\"\n", "\n", "    def __init__(self, args, prj_path=\"./\", random_state: Optional[int] = None):\n", "        self.args = args\n", "        self.random_state = random_state\n", "        # probability=True enables predict_proba; sklearn fits it with internal cross-validation, so training is slower\n", "        self._mdl = SVC(random_state=random_state, probability=True)\n", "\n", "    @staticmethod\n", "    def preprocessing_pipeline(n_components: int = 400, log_level: LogLevel = \"INFO\"):\n", "        # Default preprocessing: weighted PCA features fitted on the training split,\n", "        # then record which channels hold the features and the labels.\n", "        return Compose(\n", "            WeightedFeaturePCA(n_components=n_components, split_name=\"train\"),\n", "            SetConfig({\n", "                \"feature_channel\": \"WeightedFeaturePCA\",\n", "                \"label_channel\": \"cell_type\"\n", "            }),\n", "            log_level=log_level,\n", "        )\n", "\n", "    def fit(self, x: np.ndarray, y: np.ndarray):\n", "        \"\"\"Train the classifier.\n", "\n", "        Parameters\n", "        ----------\n", "        x\n", "            Training cell features.\n", "        y\n", "            Training labels.\n", "\n", "        \"\"\"\n", "        self._mdl.fit(x, y)\n", "\n", "    def predict(self, x: np.ndarray):\n", "        \"\"\"Predict cell labels.\n", "\n", "        Parameters\n", "        ----------\n", "        x\n", "            Samples to be predicted (samples x features).\n", "\n", "        Returns\n", "        -------\n", "        y\n", "            Predicted labels of the input samples.\n", "\n", "        \"\"\"\n", "        return self._mdl.predict(x)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Example `main.py` File\n", "\n", "Below is an example of what your `main.py` might look like when adding SVM as one of the classification methods. This file orchestrates the entire pipeline:\n", "\n", "1. **Registering** custom preprocessing functions with the `@register_preprocessor` decorator (optional).\n", "2. **Parsing** arguments and configuring hyperparameters.\n", "3. **Defining** an evaluation function that:\n", "   - Loads and preprocesses the data.\n", "   - Initializes your model (the new SVM class).\n", "   - Trains and scores the model.\n", "   - Logs results to Weights & Biases (wandb).\n", "4. **Running** the hyperparameter sweep agent (e.g., via `wandb_sweep_agent`).\n", "5. 
**Saving** results and optionally generating a second-stage tuning config file.\n", "\n", "> **Note**: For demonstration, only the relevant code is shown. Adjust it as needed for your exact pipeline or data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "Step 1: custom preprocessing functions can be registered with register_preprocessor.\n", "In this example, GaussRandProjFeature is registered under the feature.cell pipeline step,\n", "so it can later be referenced by name in the tuning configuration file.\n", "\"\"\"\n", "from sklearn.random_projection import GaussianRandomProjection\n", "\n", "from dance.registry import register_preprocessor\n", "from dance.transforms.base import BaseTransform\n", "\n", "\n", "# NOTE: register any custom preprocessing function that should be available for tuning\n", "@register_preprocessor(\"feature\", \"cell\", overwrite=True)\n", "class GaussRandProjFeature(BaseTransform):\n", "    \"\"\"Custom preprocessing to extract cell features via Gaussian random projection.\"\"\"\n", "\n", "    _DISPLAY_ATTRS = (\"n_components\", \"eps\")\n", "\n", "    def __init__(self, n_components: int = 400, eps: float = 0.1, **kwargs):\n", "        super().__init__(**kwargs)\n", "        self.n_components = n_components\n", "        self.eps = eps\n", "\n", "    def __call__(self, data):\n", "        # Feature matrix (cells x features) as a NumPy array\n", "        feat = data.get_feature(return_type=\"numpy\")\n", "        grp = GaussianRandomProjection(n_components=self.n_components, eps=self.eps)\n", "\n", "        self.logger.info(f\"Start generating cell features via Gaussian random projection (d={self.n_components}).\")\n", "        # Store the projected features in .obsm under this transform's output key\n", "        data.data.obsm[self.out] = grp.fit_transform(feat)\n", "\n", "        return data\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[INFO][2025-08-20 12:25:15,393][dance][main] \n", " files is saved in /home/zyxing/dance/examples/tuning/custom-methods/328_138\n", "[INFO][2025-08-20 
12:25:15,411][dance][config] tune mode is set to pipeline_params, tune_mode will first be converted to pipeline\n", "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n", "[INFO][2025-08-20 12:25:17,271][dance][wandb_sweep] \u001b[94m\n", "\n", "\t[*] Sweep ID: 15layk3y\n", "\u001b[0m\n", "[INFO][2025-08-20 12:25:17,272][dance][wandb_sweep_agent] Spawning agent: sweep_id='15layk3y', entity='xzy11632', project='dance-dev', count=2\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Create sweep with ID: 15layk3y\n", "Sweep URL: https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: dduxrl3d with config:\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.0.filter.gene: FilterGenesPercentile\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.1.normalize: ColumnSumNormalize\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.2.filter.gene: FilterGenesRegression\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.3.feature.cell: CellPCA\n", "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n", "\u001b[34m\u001b[1mwandb\u001b[0m: Currently logged in as: \u001b[33mxzy11632\u001b[0m. Use \u001b[1m`wandb login --relogin`\u001b[0m to force relogin\n" ] }, { "data": { "text/html": [ "wandb version 0.21.1 is available! 
To upgrade, please run:\n", " $ pip install wandb --upgrade" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Tracking run with wandb version 0.16.3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122520-dduxrl3d" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run rural-sweep-1 to Weights & Biases (docs)
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View project at https://wandb.ai/xzy11632/dance-dev" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View sweep at https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View run at https://wandb.ai/xzy11632/dance-dev/runs/dduxrl3d" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "[INFO][2025-08-20 12:25:32,694][dance][set_seed] Setting global random seed to 10\n", "[INFO][2025-08-20 12:25:32,696][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv\n", "[INFO][2025-08-20 12:25:32,983][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv\n", "[INFO][2025-08-20 12:25:33,104][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv\n", "[INFO][2025-08-20 12:25:33,107][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv\n", "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.\n", " warnings.warn(\n", "[INFO][2025-08-20 12:25:33,354][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088\n", "[INFO][2025-08-20 12:25:33,355][dance][_load_raw_data] Number of training samples: 262\n", "[INFO][2025-08-20 12:25:33,356][dance][_load_raw_data] Number of valid samples: 66\n", "[INFO][2025-08-20 12:25:33,357][dance][_load_raw_data] Number of testing samples: 138\n", "[INFO][2025-08-20 12:25:33,357][dance][_load_raw_data] Cell-types (n=9):\n", "['OPC',\n", " 'astrocytes',\n", " 
'endothelial',\n", " 'fetal_quiescent',\n", " 'fetal_replicating',\n", " 'hybrid',\n", " 'microglia',\n", " 'neurons',\n", " 'oligodendrocytes']\n", "[INFO][2025-08-20 12:25:33,360][dance][load_data] Raw data loaded:\n", "Data object that wraps (.data):\n", "AnnData object with n_obs × n_vars = 466 × 22088\n", " uns: 'dance_config'\n", " obsm: 'cell_type'\n", "[INFO][2025-08-20 12:25:33,361][dance][wrapped_func] Took 0:00:00.665608 to load and process data.\n", "[INFO][2025-08-20 12:25:33,361][dance][generate_config] The content in pipeline_params will be converted to pipeline\n", "[INFO][2025-08-20 12:25:33,363][dance][_sanitize_pipeline] Pipeline plan:\n", "\u001b[92m['FilterGenesPercentile',\n", " 'ColumnSumNormalize',\n", " 'FilterGenesRegression',\n", " 'CellPCA',\n", " None]\u001b[0m\n", "[WARNING][2025-08-20 12:25:33,370][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data\n", "[WARNING][2025-08-20 12:25:33,376][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data\n", "/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.\n", " warnings.warn(\"Expecting count data as input, but the input feature matrix does not appear to be count.\"\n", "[INFO][2025-08-20 12:25:33,618][dance][_filter_enclasc] Start generating cell features using EnClaSC\n", "[WARNING][2025-08-20 12:25:33,641][dance.CellPCA][__call__] n_components=400 must be between 0 and min(n_samples, n_features)=100 with svd_solver='auto'\n", "[INFO][2025-08-20 12:25:33,652][dance.CellPCA][__call__] Generating cell PCA features (466, 100) (k=100)\n", "[INFO][2025-08-20 12:25:33,654][dance.CellPCA][__call__] Top 10 explained variances: [0.11390967 0.07235937 0.05432951 0.04682069 0.04452541 0.03371136\n", " 0.02961524 0.0270735 0.025507 0.02293849]\n", "[INFO][2025-08-20 
12:25:33,655][dance.CellPCA][__call__] Total explained variance: 100.00%\n", "[INFO][2025-08-20 12:25:33,657][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'\n", "[INFO][2025-08-20 12:25:33,658][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e6ac2024dd3940f589dfb94f09bd833e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

Run history: acc, test_acc, train_acc
Run summary: acc 0.51515 | test_acc 0.10145 | train_acc 0.65649

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View run rural-sweep-1 at: https://wandb.ai/xzy11632/dance-dev/runs/dduxrl3d
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: ./wandb/run-20250820_122520-dduxrl3d/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: nxe1pfd6 with config:\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.0.filter.gene: FilterGenesPercentile\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.1.normalize: ColumnSumNormalize\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.2.filter.gene: FilterGenesRegression\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.3.feature.cell: CellSVD\n", "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n" ] }, { "data": { "text/html": [ "wandb version 0.21.1 is available! To upgrade, please run:\n", " $ pip install wandb --upgrade" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Tracking run with wandb version 0.16.3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122547-nxe1pfd6" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run olive-sweep-2 to Weights & Biases (docs)
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View project at https://wandb.ai/xzy11632/dance-dev" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View sweep at https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View run at https://wandb.ai/xzy11632/dance-dev/runs/nxe1pfd6" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "[INFO][2025-08-20 12:25:58,365][dance][set_seed] Setting global random seed to 10\n", "[INFO][2025-08-20 12:25:58,368][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv\n", "[INFO][2025-08-20 12:25:58,770][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv\n", "[INFO][2025-08-20 12:25:58,914][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv\n", "[INFO][2025-08-20 12:25:58,919][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv\n", "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.\n", " warnings.warn(\n", "[INFO][2025-08-20 12:25:59,103][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088\n", "[INFO][2025-08-20 12:25:59,104][dance][_load_raw_data] Number of training samples: 262\n", "[INFO][2025-08-20 12:25:59,105][dance][_load_raw_data] Number of valid samples: 66\n", "[INFO][2025-08-20 12:25:59,106][dance][_load_raw_data] Number of testing samples: 138\n", "[INFO][2025-08-20 12:25:59,107][dance][_load_raw_data] Cell-types (n=9):\n", "['OPC',\n", " 'astrocytes',\n", " 
'endothelial',\n", " 'fetal_quiescent',\n", " 'fetal_replicating',\n", " 'hybrid',\n", " 'microglia',\n", " 'neurons',\n", " 'oligodendrocytes']\n", "[INFO][2025-08-20 12:25:59,110][dance][load_data] Raw data loaded:\n", "Data object that wraps (.data):\n", "AnnData object with n_obs × n_vars = 466 × 22088\n", " uns: 'dance_config'\n", " obsm: 'cell_type'\n", "[INFO][2025-08-20 12:25:59,111][dance][wrapped_func] Took 0:00:00.743655 to load and process data.\n", "[INFO][2025-08-20 12:25:59,111][dance][generate_config] The content in pipeline_params will be converted to pipeline\n", "[INFO][2025-08-20 12:25:59,114][dance][_sanitize_pipeline] Pipeline plan:\n", "\u001b[92m['FilterGenesPercentile',\n", " 'ColumnSumNormalize',\n", " 'FilterGenesRegression',\n", " 'CellSVD',\n", " None]\u001b[0m\n", "[WARNING][2025-08-20 12:25:59,121][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data\n", "[WARNING][2025-08-20 12:25:59,127][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data\n", "/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.\n", " warnings.warn(\"Expecting count data as input, but the input feature matrix does not appear to be count.\"\n", "[INFO][2025-08-20 12:25:59,320][dance][_filter_enclasc] Start generating cell features using EnClaSC\n", "[WARNING][2025-08-20 12:25:59,340][dance.CellSVD][__call__] n_components=400 must be between 0 and min(n_samples, n_features)=100 with svd_solver='full'\n", "[INFO][2025-08-20 12:25:59,388][dance.CellSVD][__call__] Generating cell SVD features (466, 100) (k=100)\n", "[INFO][2025-08-20 12:25:59,389][dance.CellSVD][__call__] Top 10 explained variances: [0.0475235 0.10532387 0.06999493 0.05225454 0.04628773 0.03516576\n", " 0.03369351 0.02756703 0.02625577 0.02297754]\n", "[INFO][2025-08-20 
12:25:59,390][dance.CellSVD][__call__] Total explained variance: 100.00%\n", "[INFO][2025-08-20 12:25:59,391][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'\n", "[INFO][2025-08-20 12:25:59,392][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f1bbc28a9b174491931df5bffdad81bf", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

Run history: acc, test_acc, train_acc
Run summary: acc 0.5 | test_acc 0.0942 | train_acc 0.64122

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View run olive-sweep-2 at: https://wandb.ai/xzy11632/dance-dev/runs/nxe1pfd6
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: ./wandb/run-20250820_122547-nxe1pfd6/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "[INFO][2025-08-20 12:26:09,505][dance][wandb_sweep] \u001b[94m\n", "\n", "\t[*] Sweep ID: 1f9pschy\n", "\u001b[0m\n", "[INFO][2025-08-20 12:26:09,506][dance][wandb_sweep_agent] Spawning agent: sweep_id='1f9pschy', entity='xzy11632', project='dance-dev', count=2\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Create sweep with ID: 1f9pschy\n", "Sweep URL: https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: 69ew4oa2 with config:\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.max_val: 98\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.min_val: 8\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.mode: rv\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.eps: 0.7\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.mode: minmax\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.method: scmap\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.num_genes: 5388\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellPCA.n_components: 227\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellPCA.svd_solver: arpack\n", "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n" ] }, { "data": { "text/html": [ "wandb version 0.21.1 is available! 
To upgrade, please run:\n", " $ pip install wandb --upgrade" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Tracking run with wandb version 0.16.3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122613-69ew4oa2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run morning-sweep-1 to Weights & Biases (docs)
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View project at https://wandb.ai/xzy11632/dance-dev" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View sweep at https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View run at https://wandb.ai/xzy11632/dance-dev/runs/69ew4oa2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "[INFO][2025-08-20 12:26:24,459][dance][set_seed] Setting global random seed to 10\n", "[INFO][2025-08-20 12:26:24,461][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv\n", "[INFO][2025-08-20 12:26:24,854][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv\n", "[INFO][2025-08-20 12:26:24,996][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv\n", "[INFO][2025-08-20 12:26:25,000][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv\n", "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.\n", " warnings.warn(\n", "[INFO][2025-08-20 12:26:25,171][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088\n", "[INFO][2025-08-20 12:26:25,173][dance][_load_raw_data] Number of training samples: 262\n", "[INFO][2025-08-20 12:26:25,173][dance][_load_raw_data] Number of valid samples: 66\n", "[INFO][2025-08-20 12:26:25,175][dance][_load_raw_data] Number of testing samples: 138\n", "[INFO][2025-08-20 12:26:25,175][dance][_load_raw_data] Cell-types (n=9):\n", "['OPC',\n", " 'astrocytes',\n", " 
'endothelial',\n", " 'fetal_quiescent',\n", " 'fetal_replicating',\n", " 'hybrid',\n", " 'microglia',\n", " 'neurons',\n", " 'oligodendrocytes']\n", "[INFO][2025-08-20 12:26:25,179][dance][load_data] Raw data loaded:\n", "Data object that wraps (.data):\n", "AnnData object with n_obs × n_vars = 466 × 22088\n", " uns: 'dance_config'\n", " obsm: 'cell_type'\n", "[INFO][2025-08-20 12:26:25,180][dance][wrapped_func] Took 0:00:00.720475 to load and process data.\n", "[INFO][2025-08-20 12:26:25,182][dance][_sanitize_params] Params plan:\n", "\u001b[92m[{'max_val': 98, 'min_val': 8, 'mode': 'rv'},\n", " {'eps': 0.7, 'mode': 'minmax'},\n", " {'method': 'scmap', 'num_genes': 5388},\n", " {'n_components': 227, 'svd_solver': 'arpack'},\n", " None]\u001b[0m\n", "[WARNING][2025-08-20 12:26:25,193][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data\n", "[WARNING][2025-08-20 12:26:25,200][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data\n", "/home/zyxing/dance/dance/transforms/filter.py:490: RuntimeWarning: invalid value encountered in divide\n", " gene_summary = np.nan_to_num(np.array(x.var(0) / x.mean(0)), posinf=0, neginf=0).ravel()\n", "/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.\n", " warnings.warn(\"Expecting count data as input, but the input feature matrix does not appear to be count.\"\n", "[INFO][2025-08-20 12:26:25,452][dance][_filter_scmap] Start generating cell features using scmap\n", "[INFO][2025-08-20 12:26:26,328][dance.CellPCA][__call__] Generating cell PCA features (466, 5388) (k=227)\n", "[INFO][2025-08-20 12:26:26,330][dance.CellPCA][__call__] Top 10 explained variances: [0.05885801 0.0215941 0.01331656 0.01141509 0.00957707 0.00813536\n", " 0.00765066 0.00701284 0.00689292 0.00644914]\n", "[INFO][2025-08-20 
12:26:26,331][dance.CellPCA][__call__] Total explained variance: 76.84%\n", "[INFO][2025-08-20 12:26:26,332][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'\n", "[INFO][2025-08-20 12:26:26,333][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "5e22aa93f5844c9a8e659c22be9c608a", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

Run history: acc, test_acc, train_acc
Run summary: acc 0.71212 | test_acc 0.36232 | train_acc 0.89695

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View run morning-sweep-1 at: https://wandb.ai/xzy11632/dance-dev/runs/69ew4oa2
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: ./wandb/run-20250820_122613-69ew4oa2/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: h8j0oo74 with config:\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.max_val: 96\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.min_val: 1\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.mode: cv\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.eps: 0.3\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.mode: standardize\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.method: seurat3\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.num_genes: 9419\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellPCA.n_components: 636\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellPCA.svd_solver: arpack\n", "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n" ] }, { "data": { "text/html": [ "wandb version 0.21.1 is available! To upgrade, please run:\n", " $ pip install wandb --upgrade" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Tracking run with wandb version 0.16.3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122640-h8j0oo74" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run earnest-sweep-2 to Weights & Biases (docs)
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View project at https://wandb.ai/xzy11632/dance-dev" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View sweep at https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View run at https://wandb.ai/xzy11632/dance-dev/runs/h8j0oo74" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "[INFO][2025-08-20 12:26:51,507][dance][set_seed] Setting global random seed to 10\n", "[INFO][2025-08-20 12:26:51,512][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv\n", "[INFO][2025-08-20 12:26:51,877][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv\n", "[INFO][2025-08-20 12:26:52,016][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv\n", "[INFO][2025-08-20 12:26:52,020][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv\n", "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.\n", " warnings.warn(\n", "[INFO][2025-08-20 12:26:52,175][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088\n", "[INFO][2025-08-20 12:26:52,177][dance][_load_raw_data] Number of training samples: 262\n", "[INFO][2025-08-20 12:26:52,177][dance][_load_raw_data] Number of valid samples: 66\n", "[INFO][2025-08-20 12:26:52,178][dance][_load_raw_data] Number of testing samples: 138\n", "[INFO][2025-08-20 12:26:52,179][dance][_load_raw_data] Cell-types (n=9):\n", "['OPC',\n", " 'astrocytes',\n", " 
'endothelial',\n", " 'fetal_quiescent',\n", " 'fetal_replicating',\n", " 'hybrid',\n", " 'microglia',\n", " 'neurons',\n", " 'oligodendrocytes']\n", "[INFO][2025-08-20 12:26:52,181][dance][load_data] Raw data loaded:\n", "Data object that wraps (.data):\n", "AnnData object with n_obs × n_vars = 466 × 22088\n", " uns: 'dance_config'\n", " obsm: 'cell_type'\n", "[INFO][2025-08-20 12:26:52,182][dance][wrapped_func] Took 0:00:00.670366 to load and process data.\n", "[INFO][2025-08-20 12:26:52,183][dance][_sanitize_params] Params plan:\n", "\u001b[92m[{'max_val': 96, 'min_val': 1, 'mode': 'cv'},\n", " {'eps': 0.3, 'mode': 'standardize'},\n", " {'method': 'seurat3', 'num_genes': 9419},\n", " {'n_components': 636, 'svd_solver': 'arpack'},\n", " None]\u001b[0m\n", "[WARNING][2025-08-20 12:26:52,190][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data\n", "[WARNING][2025-08-20 12:26:52,195][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data\n", "/home/zyxing/dance/dance/transforms/filter.py:488: RuntimeWarning: invalid value encountered in divide\n", " gene_summary = np.nan_to_num(np.array(x.std(0) / x.mean(0)), posinf=0, neginf=0).ravel()\n", "/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.\n", " warnings.warn(\"Expecting count data as input, but the input feature matrix does not appear to be count.\"\n", "[INFO][2025-08-20 12:26:52,411][dance][_filter_seurat3] Start generating cell features using Seurat v3.0\n", "[WARNING][2025-08-20 12:26:52,460][dance.CellPCA][__call__] n_components=636 must be between 0 and min(n_samples, n_features)=466 with svd_solver='arpack'\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0df7caa8625e48b3abad4562f20e1e9d", "version_major": 2, "version_minor": 0 }, "text/plain": [ 
"VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View run earnest-sweep-2 at: https://wandb.ai/xzy11632/dance-dev/runs/h8j0oo74
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: ./wandb/run-20250820_122640-h8j0oo74/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "Run h8j0oo74 errored:\n", "Traceback (most recent call last):\n", " File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py\", line 308, in _run_job\n", " self._function()\n", " File \"/tmp/ipykernel_715844/3979844991.py\", line 88, in evaluate_pipeline\n", " preprocessing_pipeline(data)\n", " File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n", " return self.functional(*args, **kwargs)\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/dance/dance/pipeline.py\", line 247, in bounded_functional\n", " a(*args, **kwargs)\n", " File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n", " return self.functional(*args, **kwargs)\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/dance/dance/utils/wrappers.py\", line 128, in new_call\n", " return original_call(self, data, *args, **kwargs)\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/dance/dance/transforms/cell_feature.py\", line 177, in __call__\n", " cell_feat = pca.fit_transform(feat)\n", " ^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py\", line 157, in wrapped\n", " data_to_wrap = f(self, X, *args, **kwargs)\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py\", line 1152, in wrapper\n", " return fit_method(estimator, *args, **kwargs)\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File 
\"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py\", line 460, in fit_transform\n", " U, S, Vt = self._fit(X)\n", " ^^^^^^^^^^^^\n", " File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py\", line 512, in _fit\n", " return self._fit_truncated(X, n_components, self._fit_svd_solver)\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py\", line 592, in _fit_truncated\n", " raise ValueError(\n", "ValueError: n_components=466 must be strictly less than min(n_samples, n_features)=466 with svd_solver='arpack'\n", "\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m Run h8j0oo74 errored:\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m Traceback (most recent call last):\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py\", line 308, in _run_job\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m self._function()\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/tmp/ipykernel_715844/3979844991.py\", line 88, in evaluate_pipeline\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m preprocessing_pipeline(data)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m return self.functional(*args, **kwargs)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/dance/dance/pipeline.py\", line 247, in bounded_functional\n", 
"\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m a(*args, **kwargs)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m return self.functional(*args, **kwargs)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/dance/dance/utils/wrappers.py\", line 128, in new_call\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m return original_call(self, data, *args, **kwargs)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/dance/dance/transforms/cell_feature.py\", line 177, in __call__\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m cell_feat = pca.fit_transform(feat)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py\", line 157, in wrapped\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m data_to_wrap = f(self, X, *args, **kwargs)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py\", line 1152, in wrapper\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m return fit_method(estimator, *args, **kwargs)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: 
\u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py\", line 460, in fit_transform\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m U, S, Vt = self._fit(X)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py\", line 512, in _fit\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m return self._fit_truncated(X, n_components, self._fit_svd_solver)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py\", line 592, in _fit_truncated\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m raise ValueError(\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ValueError: n_components=466 must be strictly less than min(n_samples, n_features)=466 with svd_solver='arpack'\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m \n", "[INFO][2025-08-20 12:27:02,289][dance][wandb_sweep] \u001b[94m\n", "\n", "\t[*] Sweep ID: cyuki0fw\n", "\u001b[0m\n", "[INFO][2025-08-20 12:27:02,289][dance][wandb_sweep_agent] Spawning agent: sweep_id='cyuki0fw', entity='xzy11632', project='dance-dev', count=2\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Create sweep with ID: cyuki0fw\n", "Sweep URL: https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ 
"\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: jhryorbj with config:\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.max_val: 98\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.min_val: 4\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.mode: sum\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.eps: 0.1\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.mode: minmax\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.method: scmap\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.num_genes: 7435\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellSVD.algorithm: arpack\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellSVD.n_components: 793\n", "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "2a3d88f01aae494eba0f1f231adc5ab2", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='Waiting for wandb.init()...\\r'), FloatProgress(value=0.011112662653128305, max=1.0…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "wandb version 0.21.1 is available! To upgrade, please run:\n", " $ pip install wandb --upgrade" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Tracking run with wandb version 0.16.3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122705-jhryorbj" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run clear-sweep-1 to Weights & Biases (docs)
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View project at https://wandb.ai/xzy11632/dance-dev" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View sweep at https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View run at https://wandb.ai/xzy11632/dance-dev/runs/jhryorbj" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "[INFO][2025-08-20 12:27:16,705][dance][set_seed] Setting global random seed to 10\n", "[INFO][2025-08-20 12:27:16,707][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv\n", "[INFO][2025-08-20 12:27:17,099][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv\n", "[INFO][2025-08-20 12:27:17,243][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv\n", "[INFO][2025-08-20 12:27:17,247][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv\n", "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.\n", " warnings.warn(\n", "[INFO][2025-08-20 12:27:17,424][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088\n", "[INFO][2025-08-20 12:27:17,425][dance][_load_raw_data] Number of training samples: 262\n", "[INFO][2025-08-20 12:27:17,426][dance][_load_raw_data] Number of valid samples: 66\n", "[INFO][2025-08-20 12:27:17,427][dance][_load_raw_data] Number of testing samples: 138\n", "[INFO][2025-08-20 12:27:17,428][dance][_load_raw_data] Cell-types (n=9):\n", "['OPC',\n", " 'astrocytes',\n", " 
'endothelial',\n", " 'fetal_quiescent',\n", " 'fetal_replicating',\n", " 'hybrid',\n", " 'microglia',\n", " 'neurons',\n", " 'oligodendrocytes']\n", "[INFO][2025-08-20 12:27:17,430][dance][load_data] Raw data loaded:\n", "Data object that wraps (.data):\n", "AnnData object with n_obs × n_vars = 466 × 22088\n", " uns: 'dance_config'\n", " obsm: 'cell_type'\n", "[INFO][2025-08-20 12:27:17,430][dance][wrapped_func] Took 0:00:00.723811 to load and process data.\n", "[INFO][2025-08-20 12:27:17,432][dance][_sanitize_params] Params plan:\n", "\u001b[92m[{'max_val': 98, 'min_val': 4, 'mode': 'sum'},\n", " {'eps': 0.1, 'mode': 'minmax'},\n", " {'method': 'scmap', 'num_genes': 7435},\n", " {'algorithm': 'arpack', 'n_components': 793},\n", " None]\u001b[0m\n", "[WARNING][2025-08-20 12:27:17,442][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data\n", "[WARNING][2025-08-20 12:27:17,448][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data\n", "/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.\n", " warnings.warn(\"Expecting count data as input, but the input feature matrix does not appear to be count.\"\n", "[INFO][2025-08-20 12:27:17,628][dance][_filter_scmap] Start generating cell features using scmap\n", "[WARNING][2025-08-20 12:27:17,662][dance.CellSVD][__call__] n_components=793 must be between 0 and min(n_samples, n_features)=466 with svd_solver='full'\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "282db31ff00d4b2a8a5703f0a9882a78", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View run clear-sweep-1 at: 
https://wandb.ai/xzy11632/dance-dev/runs/jhryorbj
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: ./wandb/run-20250820_122705-jhryorbj/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "Run jhryorbj errored:\n", "Traceback (most recent call last):\n", " File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py\", line 308, in _run_job\n", " self._function()\n", " File \"/tmp/ipykernel_715844/3979844991.py\", line 88, in evaluate_pipeline\n", " preprocessing_pipeline(data)\n", " File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n", " return self.functional(*args, **kwargs)\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/dance/dance/pipeline.py\", line 247, in bounded_functional\n", " a(*args, **kwargs)\n", " File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n", " return self.functional(*args, **kwargs)\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/dance/dance/utils/wrappers.py\", line 128, in new_call\n", " return original_call(self, data, *args, **kwargs)\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/dance/dance/transforms/cell_feature.py\", line 275, in __call__\n", " cell_feat = svd.fit_transform(feat)\n", " ^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py\", line 157, in wrapped\n", " data_to_wrap = f(self, X, *args, **kwargs)\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py\", line 1152, in wrapper\n", " return fit_method(estimator, *args, **kwargs)\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File 
\"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_truncated_svd.py\", line 234, in fit_transform\n", " U, Sigma, VT = svds(X, k=self.n_components, tol=self.tol, v0=v0)\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/_svds.py\", line 438, in svds\n", " args = _iv(A, k, ncv, tol, which, v0, maxiter, return_singular_vectors,\n", " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", " File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/_svds.py\", line 44, in _iv\n", " raise ValueError(message)\n", "ValueError: `k` must be an integer satisfying `0 < k < min(A.shape)`.\n", "\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m Run jhryorbj errored:\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m Traceback (most recent call last):\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py\", line 308, in _run_job\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m self._function()\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/tmp/ipykernel_715844/3979844991.py\", line 88, in evaluate_pipeline\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m preprocessing_pipeline(data)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m return self.functional(*args, **kwargs)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File 
\"/home/zyxing/dance/dance/pipeline.py\", line 247, in bounded_functional\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m a(*args, **kwargs)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m return self.functional(*args, **kwargs)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/dance/dance/utils/wrappers.py\", line 128, in new_call\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m return original_call(self, data, *args, **kwargs)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/dance/dance/transforms/cell_feature.py\", line 275, in __call__\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m cell_feat = svd.fit_transform(feat)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py\", line 157, in wrapped\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m data_to_wrap = f(self, X, *args, **kwargs)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py\", line 1152, in wrapper\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m return 
fit_method(estimator, *args, **kwargs)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_truncated_svd.py\", line 234, in fit_transform\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m U, Sigma, VT = svds(X, k=self.n_components, tol=self.tol, v0=v0)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/_svds.py\", line 438, in svds\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m args = _iv(A, k, ncv, tol, which, v0, maxiter, return_singular_vectors,\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/_svds.py\", line 44, in _iv\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m raise ValueError(message)\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ValueError: `k` must be an integer satisfying `0 < k < min(A.shape)`.\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m \n", "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: 9bwt95uk with config:\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.max_val: 95\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.min_val: 6\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.mode: var\n", 
"\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.eps: 0.5\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.mode: normalize\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.method: scmap\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.num_genes: 1609\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellSVD.algorithm: randomized\n", "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellSVD.n_components: 539\n", "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a8d2975e6c5646c2b315077f5f2ba22a", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='Waiting for wandb.init()...\\r'), FloatProgress(value=0.01111244439250893, max=1.0)…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "wandb version 0.21.1 is available! To upgrade, please run:\n", " $ pip install wandb --upgrade" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Tracking run with wandb version 0.16.3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122733-9bwt95uk" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run fallen-sweep-2 to Weights & Biases (docs)
Sweep page: https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View project at https://wandb.ai/xzy11632/dance-dev" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View sweep at https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View run at https://wandb.ai/xzy11632/dance-dev/runs/9bwt95uk" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "[INFO][2025-08-20 12:27:44,260][dance][set_seed] Setting global random seed to 10\n", "[INFO][2025-08-20 12:27:44,262][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv\n", "[INFO][2025-08-20 12:27:44,657][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv\n", "[INFO][2025-08-20 12:27:44,802][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv\n", "[INFO][2025-08-20 12:27:44,806][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv\n", "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.\n", " warnings.warn(\n", "[INFO][2025-08-20 12:27:44,982][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088\n", "[INFO][2025-08-20 12:27:44,983][dance][_load_raw_data] Number of training samples: 262\n", "[INFO][2025-08-20 12:27:44,984][dance][_load_raw_data] Number of valid samples: 66\n", "[INFO][2025-08-20 12:27:44,984][dance][_load_raw_data] Number of testing samples: 138\n", "[INFO][2025-08-20 12:27:44,985][dance][_load_raw_data] Cell-types (n=9):\n", "['OPC',\n", " 'astrocytes',\n", " 
'endothelial',\n", " 'fetal_quiescent',\n", " 'fetal_replicating',\n", " 'hybrid',\n", " 'microglia',\n", " 'neurons',\n", " 'oligodendrocytes']\n", "[INFO][2025-08-20 12:27:44,987][dance][load_data] Raw data loaded:\n", "Data object that wraps (.data):\n", "AnnData object with n_obs × n_vars = 466 × 22088\n", " uns: 'dance_config'\n", " obsm: 'cell_type'\n", "[INFO][2025-08-20 12:27:44,988][dance][wrapped_func] Took 0:00:00.726977 to load and process data.\n", "[INFO][2025-08-20 12:27:44,990][dance][_sanitize_params] Params plan:\n", "\u001b[92m[{'max_val': 95, 'min_val': 6, 'mode': 'var'},\n", " {'eps': 0.5, 'mode': 'normalize'},\n", " {'method': 'scmap', 'num_genes': 1609},\n", " {'algorithm': 'randomized', 'n_components': 539},\n", " None]\u001b[0m\n", "[WARNING][2025-08-20 12:27:45,000][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data\n", "[WARNING][2025-08-20 12:27:45,005][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data\n", "/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.\n", " warnings.warn(\"Expecting count data as input, but the input feature matrix does not appear to be count.\"\n", "[INFO][2025-08-20 12:27:45,232][dance][_filter_scmap] Start generating cell features using scmap\n", "[WARNING][2025-08-20 12:27:45,256][dance.CellSVD][__call__] n_components=539 must be between 0 and min(n_samples, n_features)=466 with svd_solver='full'\n", "[INFO][2025-08-20 12:27:46,636][dance.CellSVD][__call__] Generating cell SVD features (466, 1609) (k=466)\n", "[INFO][2025-08-20 12:27:46,639][dance.CellSVD][__call__] Top 10 explained variances: [0.01398817 0.01243401 0.00987709 0.00972114 0.00953149 0.00923472\n", " 0.00884004 0.00859845 0.00861041 0.00851821]\n", "[INFO][2025-08-20 12:27:46,640][dance.CellSVD][__call__] Total explained variance: 
100.00%\n", "[INFO][2025-08-20 12:27:46,641][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'\n", "[INFO][2025-08-20 12:27:46,642][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "016062ed0cd14266a2e4a296eb2b6595", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

Run history:


acc
test_acc
train_acc

Run summary:
acc 0.36364
test_acc 0.06522
train_acc 0.67176

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " View run fallen-sweep-2 at: https://wandb.ai/xzy11632/dance-dev/runs/9bwt95uk
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: ./wandb/run-20250820_122733-9bwt95uk/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Example main.py\n", "\n", "import argparse\n", "import gc\n", "import os\n", "import pprint\n", "import random\n", "import sys\n", "from pathlib import Path\n", "from typing import get_args\n", "\n", "from dance.registry import register_preprocessor\n", "from dance.transforms.base import BaseTransform\n", "import torch\n", "import wandb\n", "import numpy as np\n", "\n", "from dance import logger\n", "from dance.datasets.singlemodality import CellTypeAnnotationDataset # your dataset\n", "from dance.pipeline import PipelinePlaner, get_step3_yaml, run_step3, save_summary_data\n", "from dance.utils import set_seed\n", "from dance.typing import LogLevel\n", "from sklearn.random_projection import GaussianRandomProjection\n", "root_path=str(Path(__file__).resolve().parent) if '__file__' in globals() else Path(\"tutorial.ipynb\").resolve().parent\n", "\n", "# Import your custom SVM class\n", "# In reality, you'd do: from your_svm_file import SVM\n", "# from your_svm_file import SVM\n", "\n", "\n", "def main(args=None):\n", " #Step 2: Parsing Arguments and configuring hyperparameters\n", " parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n", " parser.add_argument(\"--cache\", action=\"store_true\", help=\"Cache processed data.\")\n", " parser.add_argument(\"--dense_dim\", type=int, default=400, help=\"dim of PCA\")\n", " parser.add_argument(\"--gpu\", type=int, default=0, help=\"GPU id, set to -1 for CPU\")\n", " parser.add_argument(\"--log_level\", type=str, default=\"INFO\", choices=get_args(LogLevel))\n", " parser.add_argument(\"--species\", default=\"human\")\n", " parser.add_argument(\"--test_dataset\", 
nargs=\"+\", default=[138], type=int, help=\"list of dataset id\")\n", " parser.add_argument(\"--tissue\", default=\"Brain\") # TODO: Add option for different tissue name for train/test\n", " parser.add_argument(\"--train_dataset\", nargs=\"+\", default=[328], type=int, help=\"list of dataset id\")\n", " parser.add_argument(\"--valid_dataset\", nargs=\"+\", default=None, type=int, help=\"list of dataset id\")\n", " parser.add_argument(\"--tune_mode\", default=\"pipeline_params\", choices=[\"pipeline\", \"params\", \"pipeline_params\"])\n", " parser.add_argument(\"--seed\", type=int, default=10)\n", " parser.add_argument(\"--count\", type=int, default=2)\n", " parser.add_argument(\"--sweep_id\", type=str, default=None)\n", " parser.add_argument(\"--summary_file_path\", default=\"results/pipeline/best_test_acc.csv\", type=str)\n", " parser.add_argument(\"--root_path\", default=root_path, type=str)\n", " if args is None:\n", " args = parser.parse_args()\n", " else:\n", " args = parser.parse_args(args)\n", "\n", " # Construct the path to the tuning config file\n", " file_root_path = Path(\n", " args.root_path, \"_\".join([\n", " \"-\".join([str(num) for num in dataset])\n", " for dataset in [args.train_dataset, args.valid_dataset, args.test_dataset] if dataset is not None\n", " ])).resolve()\n", " logger.info(f\"\\n files is saved in {file_root_path}\")\n", "\n", " # Instantiate pipeline planer from config file\n", " pipeline_planer = PipelinePlaner.from_config_file(f\"{file_root_path}/{args.tune_mode}_tuning_config.yaml\")\n", " os.environ[\"WANDB_AGENT_MAX_INITIAL_FAILURES\"] = \"2000\"\n", "\n", " #Step 3: define evaluation function\n", " def evaluate_pipeline(tune_mode=args.tune_mode, pipeline_planer=pipeline_planer):\n", " \"\"\"\n", " The evaluation function used by wandb_sweep_agent.\n", " It:\n", " 1. Loads data.\n", " 2. Applies the pipeline.\n", " 3. Trains and scores the model.\n", " 4: Evaluate model\n", " 5. 
Logs metric(s) to wandb.\n", "        \"\"\"\n", "        wandb.init(settings=wandb.Settings(start_method='thread'))\n", "        set_seed(args.seed)\n", "\n", "        # Load dataset\n", "        data = CellTypeAnnotationDataset(train_dataset=args.train_dataset, test_dataset=args.test_dataset,\n", "                                         valid_dataset=args.valid_dataset, species=args.species, tissue=args.tissue,\n", "                                         data_dir=\"../temp_data\").load_data()\n", "\n", "        # Preprocessing pipeline\n", "        kwargs = {tune_mode: dict(wandb.config)}\n", "        preprocessing_pipeline = pipeline_planer.generate(**kwargs)\n", "        preprocessing_pipeline(data)\n", "\n", "        # Retrieve training / validation / test data\n", "        x_train, y_train = data.get_train_data()\n", "        y_train_converted = y_train.argmax(1)\n", "        x_valid, y_valid = data.get_val_data()\n", "        x_test, y_test = data.get_test_data()\n", "\n", "        # Initialize our custom SVM model and train it on integer class labels\n", "        # from your_svm_file import SVM  # Place your SVM import here\n", "        model = SVM(args, random_state=args.seed)\n", "        model.fit(x_train, y_train_converted)\n", "\n", "        # Evaluate the model\n", "        train_score = model.score(x_train, y_train)\n", "        score = model.score(x_valid, y_valid)\n", "        test_score = model.score(x_test, y_test)\n", "\n", "        # Log results to wandb; \"acc\" (validation accuracy) is the sweep metric\n", "        wandb.log({\"train_acc\": train_score, \"acc\": score, \"test_acc\": test_score})\n", "        wandb.finish()\n", "\n", "    # Step 4: Run the sweep\n", "    entity, project, sweep_id = pipeline_planer.wandb_sweep_agent(\n", "        evaluate_pipeline, sweep_id=args.sweep_id, count=args.count)\n", "\n", "    # Step 5: Save summary data (top results, etc.)\n", "    save_summary_data(entity, project, sweep_id, summary_file_path=args.summary_file_path, root_path=file_root_path)\n", "\n", "    # Optionally, handle pipeline + parameter search steps\n", "    if args.tune_mode == \"pipeline\" or args.tune_mode == \"pipeline_params\":\n", "        get_step3_yaml(result_load_path=f\"{args.summary_file_path}\", step2_pipeline_planer=pipeline_planer,\n", 
"                       conf_load_path=f\"{Path(args.root_path).resolve().parent}/step3_default_params.yaml\",\n", "                       root_path=file_root_path)\n", "        if args.tune_mode == \"pipeline_params\":\n", "            run_step3(file_root_path, evaluate_pipeline, tune_mode=\"params\", step2_pipeline_planer=pipeline_planer)\n", "\n", "\n", "if __name__ == \"__main__\":\n", "    # In the notebook, pass an empty list so argparse ignores Jupyter's own command-line arguments.\n", "    # In a standalone main.py, call main() instead so options such as --tune_mode are read from the command line.\n", "    main([])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Auto-Search Configuration\n", "\n", "The **configuration files** (e.g., `pipeline_params_tuning_config.yaml`, `pipeline_tuning_config.yaml`, `params_tuning_config.yaml`) guide the auto-search. Each file specifies how to vary the preprocessing pipeline, the model hyperparameters, or both. `main.py` loads `{tune_mode}_tuning_config.yaml` from a folder named after the dataset IDs (with the default arguments, `328_138/`). For example:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```yaml\n", "# pipeline_params_tuning_config.yaml\n", "type: preprocessor\n", "tune_mode: pipeline_params\n", "pipeline_tuning_top_k: 2\n", "parameter_tuning_freq_n: 2\n", "pipeline:\n", "  - type: filter.gene\n", "    include:\n", "      - FilterGenesPercentile\n", "      - FilterGenesScanpyOrder\n", "      - FilterGenesPlaceHolder\n", "    default_params:\n", "      FilterGenesScanpyOrder:\n", "        order: [\"min_counts\", \"min_cells\", \"max_counts\", \"max_cells\"]\n", "        min_counts: 1\n", "        max_counts: 134732\n", "        min_cells: 1\n", "        max_cells: 401\n", "  - type: normalize\n", "    include:\n", "      - ScaleFeature\n", "      - ScTransform\n", "      - Log1P\n", "      - NormalizeTotal\n", "      - NormalizePlaceHolder\n", "    default_params:\n", "      ScTransform:\n", "        processes_num: 8\n", "  - type: 
filter.gene\n", "    include:\n", "      # - HighlyVariableGenesLogarithmizedByMeanAndDisp\n", "      - HighlyVariableGenesRawCount\n", "      - HighlyVariableGenesLogarithmizedByTopGenes\n", "      - FilterGenesTopK\n", "      - FilterGenesRegression\n", "      # - FilterGenesNumberPlaceHolder\n", "    default_params:\n", "      FilterGenesTopK:\n", 
"        num_genes: 100\n", "      FilterGenesRegression:\n", "        num_genes: 100\n", "      HighlyVariableGenesRawCount:\n", "        n_top_genes: 100\n", "      HighlyVariableGenesLogarithmizedByTopGenes:\n", "        n_top_genes: 100\n", "  - type: feature.cell\n", "    include:\n", "      - WeightedFeaturePCA\n", "      - WeightedFeatureSVD\n", "      - CellPCA\n", "      - CellSVD\n", "      - GaussRandProjFeature  # registered custom preprocessing function\n", "      - FeatureCellPlaceHolder\n", "    params:\n", "      out: feature.cell\n", "      log_level: INFO\n", "  - type: misc\n", "    target: SetConfig\n", "    params:\n", "      config_dict:\n", "        feature_channel: feature.cell\n", "        label_channel: cell_type\n", "wandb:\n", "  entity: xzy11632\n", "  project: dance-dev\n", "  method: grid  # grid search tries every combination\n", "  metric:\n", "    name: acc  # validation accuracy, as logged by evaluate_pipeline\n", "    goal: maximize\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Tips**:\n", "\n", "1. With `--tune_mode pipeline`, only the preprocessing pipeline is tuned.\n", "2. With `--tune_mode params`, only the model parameters are tuned.\n", "3. With `--tune_mode pipeline_params`, the search runs in two stages: first over preprocessing pipelines, then over model parameters.\n", "\n", "---\n", "\n", "## 4. Testing & Execution\n", "\n", "After setting everything up, run one of the following:\n", "\n", "```bash\n", "# Search for the best preprocessing pipeline only:\n", "python main.py --tune_mode pipeline\n", "\n", "# Search for the best model hyperparameters only:\n", "python main.py --tune_mode params\n", "\n", "# Two-stage search over both pipeline and parameters:\n", "python main.py --tune_mode pipeline_params\n", "```\n", "\n", "Once the search completes, the results are logged to Weights & Biases (wandb), and `save_summary_data` writes a CSV of the top-performing runs. 
With `pipeline` or `pipeline_params`, the script also calls `get_step3_yaml` to generate the parameter-tuning configs for the second stage; with `pipeline_params`, that second stage is run automatically via `run_step3`." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Summary\n", "\n", "To summarize, plugging a custom method into the auto-search framework takes five steps:\n", "\n", "1. Inherit from the appropriate base class (in our case, `BaseClassificationMethod`).\n", "2. Implement the `fit`, `predict`, and (optionally) `preprocessing_pipeline` methods.\n", "3. Integrate your custom model into the `main.py` script.\n", "4. Create and reference the necessary configuration (YAML) files.\n", "5. Run the search with `--tune_mode` set to `pipeline`, `params`, or `pipeline_params`.\n", "\n", "The same recipe applies to any custom algorithm, from simple classifiers such as an SVM to deep learning methods with pretraining steps.\n", "\n", "Happy coding, and good luck with your hyperparameter searches!\n" ] } ], "metadata": { "kernelspec": { "display_name": "dance", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" } }, "nbformat": 4, "nbformat_minor": 2 }